Crate quantiles

Source
Expand description

This crate provides approximate quantiles over data streams in a moderate amount of memory.

Order statistics is a rough business. Exact solutions are expensive in terms of memory and computation. Recent literature has advanced approximations but each have fundamental tradeoffs. This crate is intended to be a collection of approximate algorithms that provide guarantees around space consumption.

Modules§

  • This is an implementation of the algorithm presented in Cormode, Korn, Muthukrishnan, Srivastava’s paper “Effective Computation of Biased Quantiles over Data Streams”. The ambition here is to approximate quantiles on a stream of data without having a boatload of information kept in memory.
  • Greenwald Khanna calculates epsilon-approximate quantiles. If the desired quantile is phi, the epsilon-approximate quantile is any element in the range of elements that rank between lbound((phi-epsilon) x N) and lbound((phi+epsilon) x N)
  • ‘histogram’ approximates a distribution calculation by counting the number of times samples fall into pre-configured bins. This implementation does not require bins to be equally sized. The user must specify upper bounds on bins via Bounds. The implementation includes a +Inf bound automatically.
  • Misra-Gries calculates an ε-approximate frequency count for a stream of N elements. The output is the k most frequent elements.